oriented recursive partition
Ranking Data with Continuous Labels through Oriented Recursive Partitions
We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space X and the goal is to order all possible observations x in X by means of a scoring function s: X R so that s(X) and Y tend to increase or decrease together with highest probability. This problem generalizes bi/multi-partite ranking to a certain extent and the task of finding optimal scoring functions s(x) can be naturally cast as optimization of a dedicated functional criterion, called the IROC curve here, or as maximization of the Kendall τ related to the pair (s(X), Y). From the theoretical side, we describe the optimal elements of this problem and provide statistical guarantees for empirical Kendall τ maximization under appropriate conditions for the class of scoring function candidates. We also propose a recursive statistical learning algorithm tailored to empirical IROC curve optimization and producing a piecewise constant scoring function that is fully described by an oriented binary tree. Preliminary numerical experiments highlight the difference in nature between regression and continuous ranking and provide strong empirical evidence of the performance of empirical optimizers of the criteria proposed.
Reviews: Ranking Data with Continuous Labels through Oriented Recursive Partitions
The ms tries to solve a problem called continuous ranking, which is an extension of bipartite ranking. The idea of continuous ranking is to find a score function that increase or decrease with output y with highest probability. In bipartite ranking, each data point x is given a binary label y \in { 1, -1}. The goal is to find a score function s(x) such that the difference of p(s(x) t y 1) and p(s(x) t y -1) is maximal for all threshold t. This can be achieved by maximizing AUC of the score function. The ms extends the bipartite ranking idea for continuous output Y, by introducing a threshold y to separate the data into two ordered parts.
Ranking Data with Continuous Labels through Oriented Recursive Partitions
Clémençon, Stéphan, Achab, Mastane
We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space X and the goal is to order all possible observations x in X by means of a scoring function s: X R so that s(X) and Y tend to increase or decrease together with highest probability. This problem generalizes bi/multi-partite ranking to a certain extent and the task of finding optimal scoring functions s(x) can be naturally cast as optimization of a dedicated functional cri- terion, called the IROC curve here, or as maximization of the Kendall τ related to the pair (s(X), Y). From the theoretical side, we describe the optimal elements of this problem and provide statistical guarantees for empirical Kendall τ maximiza- tion under appropriate conditions for the class of scoring function candidates. We also propose a recursive statistical learning algorithm tailored to empirical IROC curve optimization and producing a piecewise constant scoring function that is fully described by an oriented binary tree.